FIELD

[0001] Various embodiments described below relate generally to multi-party
conferencing and to audio signal source discrimination and, more particularly but not
exclusively, to methods and apparatus for indicating the source of information in a
multi-party conference and to methods and apparatus for discriminating between audio
signal sources having different spectral characteristics.

BACKGROUND 

[0002] Teleconferencing is a well-established method of communication 
between parties at different locations. Typically, the conference is held using standard 
telephone services and equipment to enable participants to speak to each other. A 
participant may use a speakerphone for greater comfort or to allow the participant to use 
both hands for other tasks (e.g., taking notes, handling materials being discussed in the
teleconference, etc.). One of the shortcomings of traditional teleconferencing is that 
participants may not know which of the various participants is speaking at any given time 
during the teleconference. Because normal telephone service band-limits the connection, 
this speaker discrimination problem can be exacerbated. Even if the telephone service is 
not band-limited, speech transmitted over the connection (and/or emitted from a 
telephone speaker) has other characteristics that are different from those of live speech.

SUMMARY 

[0003] In accordance with aspects of the various described embodiments, a
method and system to indicate which participant or participants are providing information
during a multi-party conference are provided. In one aspect, each participant has
equipment (e.g., personal computers, personal digital assistants (PDAs) or other 
computing devices) to display information being transferred during the multi-party 
conference. In some circumstances, the identity of the participant providing the 
information is not apparent to the other participants. 

[0004] This aspect incorporates a sourcing signaler and a source indicator in the 
participant equipment. The sourcing signaler provides a signal that indicates the identity 
of a participant providing information to the multi-party conference to be sent to the other 
participants. The source indicators of the other participant equipment receive the signal
and, in response, cause a user interface (UI) displayed by the participant equipment to
provide an indication that the participant identified by the received signal is providing 
information. In some embodiments, the UI causes an identifier of the participant to 
change appearance (e.g., causing the identifier to blink or flash, animate, change color or 
size, etc.) in a noticeable manner so that a participant viewing the UI can easily know
which participant is providing the information. This aspect can be advantageously used 
in web conferencing applications in which participants may discuss material displayed by 
the UI via a teleconference. When a participant is speaking on the telephone, this aspect 
can cause the participant's name or other identifier to change appearance as described 
above. 

[0005] In accordance with other aspects of the various described embodiments,
a method and system to discriminate between sources of an audio signal are provided. In
one of these other aspects, an audio discriminator is used to distinguish an
acoustic signal that was generated by a person speaking from an acoustic signal generated
in a band-limited manner (e.g., the acoustic output signal from a speakerphone). In one
example application, the audio discriminator can be incorporated in the participant
equipment described above so that the sourcing signaler residing in the participant 
equipment can automatically detect when its participant is speaking and avoid 
erroneously sending the signal in response to another participant's voice coming over a 
speakerphone. 

[0006] In one of these other aspects, the audio discriminator analyzes the 
spectrum of detected audio signals and generates several parameters from the spectrum 
and from past determinations to determine the source of an audio signal. In one 
implementation, a finite state machine uses these parameters to determine the source of 
an audio signal on a frame-by-frame basis. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0007] Non-limiting and non-exhaustive embodiments are described with 
reference to the following figures, wherein like reference numerals refer to like parts 
throughout the various views unless otherwise specified. 

[0008] FIG. 1 is a block diagram illustrating a system for supporting multi-party
conferencing with information source indication, according to one embodiment;

[0009] FIGS. 2 and 2A are diagrams illustrating an example user interface (UI)
that can indicate when a participant in a multi-party conference is providing information 
during the multi-party conference, according to one embodiment; 

[0010] FIG. 3 is a flow diagram illustrating operational flow of the system of 
FIG. 1, according to one embodiment; 

[0011] FIG. 4 is a block diagram illustrating a system for source discrimination 
of an audio signal, according to one embodiment;

[0012] FIGS. 5A-5C are diagrams illustrating spectral information of various
audio signals; 

[0013] FIG. 6 is a flow diagram illustrating operational flow of the system of 
FIG. 4, according to one embodiment; 

[0014] FIG. 7 is a block diagram illustrating an audio source discriminator of 
FIG. 4, according to one embodiment; 

[0015] FIG. 8 is a flow diagram illustrating operational flow of the audio 
source discriminator of FIG. 7, according to one embodiment; 

[0016] FIG. 9 is a block diagram illustrating parameters generated by and/or 
used by the frame classifier of FIG. 7, according to one embodiment; 

[0017] FIG. 10 is a flow diagram illustrating operational flow of the frame 
classifier of FIG. 7, according to one embodiment; 

[0018] FIG. 11 is a flow diagram illustrating operational flow in determining 
whether a frame is of live speech, according to an alternative embodiment; 

[0019] FIGS. 11A and 11B are diagrams illustrating simplified examples of 
spectrum and timing data of live speech and external noise; 

[0020] FIG. 12 is a state diagram illustrating the audio source finite state 
machine (FSM) of FIG. 7, according to one embodiment; 

[0021] FIG. 13 is a diagram schematically illustrating parameters used by the 
audio source FSM of FIG. 12 in determining its next state, according to one embodiment; 

[0022] FIG. 14 is a flow diagram illustrating operational flow in the audio 
source FSM of FIG. 12 in determining its next state from the phone state; 

[0023] FIG. 15 is a flow diagram illustrating operational flow in the audio 
source FSM of FIG. 12 in determining its next state from the live state;

[0024] FIG. 16 is a flow diagram illustrating operational flow in the audio
source FSM of FIG. 12 in determining its next state from the unsure state; and 

[0025] FIG. 17 is a block diagram illustrating an example computing 
environment suitable for practicing the above embodiments. 

DETAILED DESCRIPTION 

[0026] FIG. 1 illustrates a system 100 that supports multi-party conferencing 
with information source indication, according to one embodiment. In this embodiment, 
system 100 includes a network 101 through which N participants can communicate with each
other (where N is an integer greater than two in a typical embodiment). The network can 
be any suitable communication network such as, for example, the Internet, a local area 
network (LAN), a campus area network, a virtual private network (VPN), etc. Further, 
the network may operate in a client-server mode or a peer-to-peer mode. 

[0027] The N participants, in this embodiment, have participant equipment
(PE) 102_1 through PE 102_N. In addition, PEs 102_1-102_N respectively include network
interfaces 104_1-104_N, sourcing signalers 106_1-106_N, and user interfaces (UIs) 108_1-108_N.
UIs 108_1-108_N respectively include information source indicators 110_1-110_N. In this
embodiment, PEs 102_1-102_N are implemented using conventional, commercially
available personal computers. In other embodiments, other suitable computing devices
can be used to implement the PEs. In addition, in this embodiment, PEs 102_1-102_N each
include other communication devices such as, for example, telephones, radios, cameras,
and/or other audio or video devices, also referred to herein as adjunct devices 112_1-112_N.

[0028] Further, in this embodiment, network interfaces 104_1-104_N, sourcing
signalers 106_1-106_N, user interfaces (UIs) 108_1-108_N, and information source
indicators 110_1-110_N are implemented as software modules or components executed by
computers in the PEs.

Example UI Operation Overview 
[0029] Referring again to FIG. 1, each of PEs 102_1-102_N is configured to
provide, on a real-time or near-real-time basis, an indication of which participant (or
participants) is currently providing information during the multi-party conference. This
information may be provided via network 101, or through other links using the adjunct
devices, or through a combination of network 101 and the adjunct devices. For example,
in some embodiments, PEs 102_1-102_N are used to display visual information (e.g., text,
spreadsheet, graphic, video, etc.) via network 101, which the participants may then
verbally discuss using adjunct devices such as telephones. In one embodiment,
UIs 108_1-108_N display this visual information in a special viewing area of the UI. For
example, FIG. 2 shows a viewing area 202_1 of UI 108_1. UIs 108_2-108_N can have similar
viewing areas. 

[0030] Returning to FIG. 1, information source indicators 110_1-110_N are also
used to display a list of participants via UIs 108_1-108_N. This participant information may
be shared among the PEs when the multi-party conference is initiated. For example, each
PE may send a participant's name to the PEs of the other participants of the multi-party
conference via network 101. In other embodiments, the participant information may be
different; e.g., a handle or alias, an icon, an avatar, a photo, a video, or other graphic.
Each of UIs 108_1-108_N can then display the participant information in a special
participant list of the UI. For example, FIG. 2 shows a participant list 204_1 of UI 108_1.
UIs 108_2-108_N can have similar participant list areas.

[0031] Referring to FIGS. 1 and 2A, information source indicators 110_1-110_N
can cause UIs 108_1-108_N to display an indication of the participant that is currently
providing information during the multi-party conference. For example, a participant
named "Tom" is using a PE 102_1 (which includes a personal computer in this example) to
participate in a web conference. As shown in FIG. 2A, PE 102_1 displays a graphic 206_1
(e.g., a Microsoft PowerPoint® slide) in viewing area 202_1. In this example, Tom is
discussing the graphic with the other participants via a teleconference (i.e., adjunct 112_1
is a telephone). Sourcing signaler 106_1 provides a sourcing signal to UI 108_1 via an internal
connection and to PEs 102_2-102_N via network 101 while Tom is discussing the graphic to
indicate that Tom is currently providing information (i.e., is speaking) via the telephone
link. Similarly, sourcing signalers 106_2-106_N provide a sourcing signal whenever their
associated participants are providing information. Embodiments for determining when a 
participant is providing information are described below in conjunction with FIGS. 4-16. 

[0032] In substantially real time, information source indicators 110_1-110_N
detect the signal and cause UIs 108_1-108_N to provide an indication that Tom is speaking.
For example, in this embodiment UI 108_1 indicates that Tom is speaking by causing the
name "Tom" in participant list 204_1 to enlarge and become bolder as shown in FIG. 2A.
In other embodiments, the name (or graphic, etc.) may flash or blink, change colors, 
become highlighted, etc. More indications are described in conjunction with FIG. 3 
below. Although in this example the information source being indicated is the 
teleconference audio information, in other embodiments, the source of other types of 
information may be indicated. For example, the source of the information being 
displayed in viewing area 202_1 can be indicated in participant list 204_1.

Operation of an Example UI in Indicating an Information Source

[0033] FIG. 3 illustrates operational flow of PEs 102_1-102_N (FIG. 1) in
indicating the source of information during a multi-party conference, according to one
embodiment. For clarity, only the operational flow of PE 102_1 is described, with the
operation of PEs 102_2-102_N being substantially similar.

[0034] In a block 302, PE 102_1 obtains a list of participants in the multi-party
conference. As previously described, this list may be in the form of text (e.g., names,
aliases, etc.) or in graphical form (e.g., icons, photographs, video, etc.). In one
embodiment, PE 102_1 obtains this list via network 101 (FIG. 1).

[0035] In one embodiment, in joining a multi-party conference, each participant 
provides a name or other identifier to a web-based administrator that coordinates the 
multi-party conference. This administrator can then provide the names/identifiers to the 
other participants joining the multi-party conference. 

[0036] In another embodiment, a participant setting up the multi-party 
conference can send invitations to other parties using a calendar application (e.g., 
Microsoft Outlook®), and then add the identifiers of those parties accepting the invitation 
to the participant list. In some embodiments, the participants are added manually while 
in others the participants are added automatically when they join the multi-party 
conference. This embodiment can be used in a client-server architecture or a peer-to-peer
architecture. 

[0037] In a block 304, this embodiment of PE 102_1 displays the list obtained in
block 302 in participant list area 204_1 (FIG. 2). As previously described, the list includes
identifiers of the participants, which may be displayed in the form of text, graphics,
video, etc. In some embodiments, a participant's PE may be configured to omit
displaying that participant's identifier. 

[0038] In decision block 306, PE 102_1 determines whether it has received a
sourcing signal from one of PEs 102_1-102_N. In one embodiment, one or more of sourcing
signalers 106_1-106_N of PEs 102_1-102_N can send sourcing signals. As previously
described, a PE sends a sourcing signal when its associated participant is providing 
information during the multi-party conference. In one embodiment, each sourcing signal 
provides the identifiers of participants providing information to the other participants in 
the multi-party conference. For example, a sourcing signal can be in the form of a packet 
sent over network 101, with the packet having a "sourcing" bit set to indicate the sender 
is providing information to the other participants. In other embodiments, the sourcing 
signals may have another form. In some embodiments, PE 102_1 may be configured to
omit determining whether it receives a sourcing signal from itself. 

[0039] In still other embodiments, the sourcing signal may be "de-asserted" to 
indicate that a participant is no longer providing information during the multi-party 
conference. For example, when the sourcing signal is a packet, in some embodiments, a 
subsequent packet may be sent over network 101 with the "sourcing" bit reset when the 
participant is no longer providing information to the other participants. In another 
embodiment, a sourcing signal remains "asserted" until a sourcing signal from another 
participant is received. 
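
By way of illustration only, the packet-based sourcing signal described above might be sketched as follows. The wire layout (one flag byte, a two-byte identifier length, then a UTF-8 identifier) and the names `SOURCING_BIT`, `encode_sourcing_packet`, and `decode_sourcing_packet` are assumptions made for this sketch; the description does not specify a packet format.

```python
import struct
from typing import Tuple

# Hypothetical layout (an assumption, not taken from the description):
# 1 flag byte, 2-byte big-endian identifier length, UTF-8 identifier.
SOURCING_BIT = 0x01


def encode_sourcing_packet(participant_id: str, sourcing: bool) -> bytes:
    """Build a packet whose "sourcing" bit is set (asserted) or reset (de-asserted)."""
    flags = SOURCING_BIT if sourcing else 0
    ident = participant_id.encode("utf-8")
    return struct.pack("!BH", flags, len(ident)) + ident


def decode_sourcing_packet(packet: bytes) -> Tuple[str, bool]:
    """Return (participant identifier, whether the sourcing bit is set)."""
    flags, length = struct.unpack_from("!BH", packet, 0)
    ident = packet[3:3 + length].decode("utf-8")
    return ident, bool(flags & SOURCING_BIT)
```

A receiving PE could call `decode_sourcing_packet` on each packet and treat a reset bit as the de-asserted signal.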

[0040] In a block 308, if a sourcing signal has been received, PE 102_1 provides
an indication that the participant corresponding to the sourcing signal is providing
information. In one embodiment, information source indicator 110_1 causes the identifier
to indicate that the participant associated with the identifier is providing the information.
As previously described, the indication may be causing the identifier to change
appearance if the identifier is text (e.g., change font, size, color, become highlighted,
bolded, underlined, etc.). If the identifier is not text, the indication can be to cause the
identifier to have animation (e.g., move, flash, rotate, etc.), or change format type (e.g.,
change from an icon to a photograph or video, or from a photograph to video, etc.). In
yet other embodiments, the indication may be displaying the identifier in a "providing
information" area of the UI. Other embodiments include displaying a graphic (e.g., a
bullet, an arrow, a star, a speech cloud, etc.) or text (e.g., "speaking", "sourcing") near the 
identifier. Still another embodiment includes reordering the list of participants so that the 
participant currently providing information is at a designated position (e.g., at the top of 
the list). Other types of indications may also be used without departing from the spirit 
and scope of the present invention. 

[0041] In a block 310, PE 102_1 then determines whether the multi-party conference has
ended. If the multi-party conference has not ended, operational flow returns to block 306.
Otherwise, the operational flow terminates. 
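
The operational flow of blocks 302 through 310 can be sketched, under stated assumptions, as a simple event loop. The function names, the event representation (a tuple of identifier and asserted flag, with `None` marking the end of the conference), and the text-based rendering (an asterisk and upper case for a participant who is providing information) are all hypothetical choices for this sketch, not features recited above.

```python
def render_participant_list(participants, sourcing):
    """Blocks 304/308: render the list, marking participants who are sourcing."""
    lines = []
    for name in participants:
        if name in sourcing:
            lines.append(f"* {name.upper()}")  # changed appearance
        else:
            lines.append(f"  {name}")
    return lines


def run_indicator(participants, events):
    """Process sourcing signals until the conference ends.

    `events` yields (participant_id, asserted) tuples; None means the
    conference has ended (block 310). Returns each rendered list.
    """
    sourcing = set()
    rendered = []
    for event in events:
        if event is None:  # block 310: conference ended
            break
        participant_id, asserted = event  # block 306: signal received
        if asserted:
            sourcing.add(participant_id)
        else:
            sourcing.discard(participant_id)
        rendered.append(render_participant_list(participants, sourcing))
    return rendered
```

Using a set for the participants currently sourcing allows more than one participant to be indicated at the same time.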

Overview of an Example PE with Audio Source Discrimination

[0042] FIG. 4 illustrates a system 400 for source discrimination of an acoustic 
or audio signal, according to one embodiment. The terms acoustic signal and audio 
signal are used interchangeably herein, and refer to sound waves (i.e., compression 
waves) propagated in air (or other media). In this embodiment, system 400 includes a
PE 401_1, which includes a personal computer 402_1 and a speakerphone 403_1 for one
participant of a multi-party conference. PE 401_1 is substantially similar to PE 102_1
(FIG. 1) in terms of hardware, except that PE 401_1 includes a microphone (or a connector
for an external microphone), whereas embodiments of PE 102_1 need not have a
microphone. 

[0043] Other participants of the multi-party conference generally also have a 
PE having a computing device and speakerphone, which are omitted from FIG. 4 for 
clarity. The other PEs in this example embodiment are substantially similar to that 
shown in FIG. 4, although in system 400, a participant need not have a speakerphone to 
participate. 

[0044] In this embodiment, PE 401_1 (and other PEs of system 400) are
connected to network 101 and can transfer information via network 101 as described 
above in conjunction with FIGS. 1-3. However, in other embodiments, the audio source 
discrimination is not limited to multi-party conferencing applications; rather, the audio 
source discrimination can be used in any suitable application requiring discrimination 
between a substantially complete spectrum audio signal and a band-limited audio signal. 

[0045] In this embodiment, computer 402_1 includes a sourcing signaler 406_1
having an audio discriminator 412_1, a microphone interface 414_1, and previously
described network interface 104_1. In one embodiment, audio discriminator 412_1,
microphone interface 414_1, and network interface 104_1 are implemented as software
modules or components executed by computer 402_1. In addition, in some embodiments,
computer 402_1 can include a UI 108_1 and sourcing signaler 106_1 as shown in FIG. 1.
This embodiment is advantageously used in multi-party conferencing applications in 
which participants can communicate with each other via telephone. In such applications, 
this embodiment of system 400 allows the participants to know which participant is 
speaking over the telephone.

[0046] Further, in this embodiment, audio discriminator 412_1 is designed to
discriminate between speech that is spoken by a participant (also referred to herein as live
speech) and speech from a speakerphone (also referred to herein as phone speech) in the
presence of noise. Stated another way, in this context, live speech comprises acoustic
signals generated by a person (e.g., the participant), whereas phone speech comprises
acoustic signals generated by an audio transducer device. Audio discriminator 412_1
advantageously allows sourcing signaler 406_1 to distinguish between speech coming from
its associated participant and speech coming from speakerphone 403_1 (i.e., when a
different participant is sourcing information). In one embodiment, to discriminate
between live and phone speech, audio discriminator 412_1 detects differences in spectral
content between live speech, phone speech, and external noise, which are illustrated 
below in FIGS. 5A-5C. 

[0047] FIG. 5A illustrates a simplified example of the frequency range of live
speech over time (i.e., a spectrogram). Most of the frequency content of live speech lies
within a range of zero to 8 kHz. Thus, after low pass filtering at 8 kHz, a sampling rate
of 16 kHz is adequate for live speech. Higher sampling rates can be used to obtain a
larger frequency range. Vowels are typically at the lower end of the frequency
range (i.e., with most of their spectral range lying below 3.4 kHz). On the other hand,
consonants (especially fricatives) are typically at the higher end of the frequency range
(i.e., with most of their spectral range lying above 3.4 kHz). For example, band 501
represents the frequency content over time of a vowel. As shown in FIG. 5A, this
example vowel (i.e., band 501) ranges from zero to about 4 kHz with duration indicated
as Δt_1. In contrast, band 502 represents the frequency content over time of a fricative.
This example fricative ranges from about 3 kHz to about 8 kHz with duration indicated as
Δt_2. Typically, Δt_1 is larger than Δt_2 (i.e., vowels tend to have a longer duration than
consonants).

[0048] FIG. 5B illustrates a simplified example spectrogram of phone speech. 
Before transmitting a voice signal over telephone lines, typical U.S. telephone systems
low pass filter the voice signal to 3.4 kHz. Thus, most of the energy of a vowel (e.g., 
band 504) is passed through whereas the energy of a fricative (e.g., band 505) is almost 
completely filtered out. In addition, the relatively short durations between vowels and 
consonants and between syllables or words (described above for live speech) are 
substantially preserved in phone speech. Thus, one embodiment of audio 
discriminator 412_1 detects whether the speech has frequency components greater than
3.4 kHz. This embodiment may be practical for applications in which little or no external 
noise is received along with the speech. However, in a typical environment, external 
noise will also be received. 

[0049] FIG. 5C illustrates a simplified example spectrogram of phone speech in 
a noisy environment. Band 504 represents a vowel as previously described. Band 510 
represents a fricative with noise occurring at the same time. Thus, in this example, 
band 510 is the combination of band 505 (FIG. 5B) and the band representing the spectral 
content of the noise. Band 512 represents the spectral content of some other noise. The 
noise for bands 510 and 512 may come from a variety of sources such as, for example, 
ruffling papers, typing on a keyboard, fans, bumping or knocking into furniture, etc. 
Because the noise is generally independent of the speech, the time gaps between noise 
and speech may be relatively long, as indicated by an arrow 514. 

[0050] Although "narrow-band" telephone system characteristics are described
above in conjunction with FIG. 5B, other embodiments of audio discriminator 412_1
(FIG. 4) can be designed for use with wide-band telephone systems (which do not limit
phone speech to 3.4 kHz). In particular, although wide-band telephone systems support a
frequency range that is closer to that of live speech, the spectral characteristics of
wide-band telephone speech are different from those of live speech. Thus, in embodiments
directed for use in wide-band telephone applications, audio discriminator 412_1 can be designed to detect the differences
in spectral content between live speech, wide-band phone speech, and external noise. 

[0051] FIG. 6 illustrates operational flow of system 400 (FIG. 4) in sending a 
sourcing signal, according to one embodiment. This operational flow loops until the 
multi-party conference terminates. Referring to FIGS. 4-6, this embodiment's 
operational flow is described below. 

[0052] In a block 602, computer 402_1 receives a frame of audio data. In this
embodiment, the audio data are samples of audio signals detected by microphone 414_1,
which converts them to an electrical signal. In one embodiment, audio
discriminator 412_1 samples the electrical signal from microphone 414_1 at a rate of
16 kHz, although a rate over 16 kHz may be used in other embodiments. A frame, in this
embodiment, has 512 samples. In other embodiments, different frame sizes can be used. 
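
As a minimal sketch of the framing described above, the following splits a stream of samples into 512-sample frames and computes the resulting frame duration at a 16 kHz sampling rate. The use of non-overlapping frames and the dropping of a trailing partial frame are assumptions of this sketch; the description does not state whether frames overlap.

```python
SAMPLE_RATE_HZ = 16_000  # sampling rate of this embodiment
FRAME_SIZE = 512         # samples per frame in this embodiment


def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Yield consecutive, non-overlapping frames (an assumption of this sketch).

    A trailing partial frame, if any, is dropped.
    """
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        yield samples[start:start + frame_size]


# Each frame spans 512 / 16000 seconds, i.e. 32 ms.
FRAME_DURATION_MS = 1000 * FRAME_SIZE / SAMPLE_RATE_HZ
```

At these values a source decision made once per frame would be refreshed roughly every 32 ms.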

[0053] In a block 604, this embodiment of audio discriminator 412_1 classifies
the received frame using the frame's spectral data. In one embodiment, audio
discriminator 412_1 processes the frame to obtain the spectral data. Then, this
embodiment of audio discriminator 412_1, in effect, compares the spectral data to the
spectrograms of FIGS. 5A-5C to determine whether the frame was taken from live speech 
or phone speech. 

[0054] In a block 606, sourcing signaler 406_1 determines the source of the
audio signal based on the frame classification of block 604 and past determinations. In
one embodiment, audio discriminator 412_1 determines whether the source of the audio
signal is live speech or phone speech. In other embodiments, sourcing signaler 406_1 may
determine that the source of the audio signal falls into one or more other categories
(unknown/not sure, silence, noise, etc.). Block 606 is different from block 604 in that
block 604 relates to frame classification rather than determining the source of the audio
signal. For example, sourcing signaler 406_1 may require several frames before it can
determine whether the source of an audio signal is live speech or phone speech. 

[0055] In decision block 608, sourcing signaler 406_1 checks whether in
block 606 the source of the audio signal is live speech. If the source was determined to 
be live speech, operational flow proceeds to a block 610. 

[0056] In block 610, sourcing signaler 406_1 sends a sourcing signal to
network 101 as previously described. Then in a decision block 612, sourcing
signaler 406_1 checks whether the multi-party conference has terminated before returning
to block 602 to receive another frame of audio data. If the multi-party conference has 
terminated, operational flow of this aspect of system 400 ends. Similarly, if in block 608 
the source of the audio signal was not live speech, operational flow proceeds directly to 
decision block 612. 
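
The loop of blocks 602 through 612 might be sketched as follows. The callables `determine_source` and `send_signal` are hypothetical stand-ins for the frame classification and source determination (blocks 604 and 606) and for the transmission over the network (block 610); the string label "live" is likewise an assumption of this sketch.

```python
def run_sourcing_signaler(frames, determine_source, send_signal):
    """Send a sourcing signal for each frame whose source is live speech.

    `frames` is exhausted when the conference terminates (block 612).
    Returns the number of sourcing signals sent.
    """
    sent = 0
    for frame in frames:                  # block 602: receive a frame
        source = determine_source(frame)  # blocks 604 and 606
        if source == "live":              # decision block 608
            send_signal()                 # block 610
            sent += 1
    return sent
```

Passing the two steps in as callables keeps the loop independent of how the source is actually determined.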

[0057] FIG. 7 illustrates an embodiment of audio source discriminator 412_1
(FIG. 4). In this embodiment, audio source discriminator 412_1 includes a spectrum
analyzer 702, a frame classifier 704, and an audio source finite state machine 706 (also
referred to herein as FSM 706). In one embodiment, spectrum analyzer 702, frame
classifier 704, and FSM 706 are implemented as software modules or components that
can be executed by computer 402_1 (FIG. 4). Operation of this embodiment of audio
source discriminator 412_1 is described in conjunction with FIG. 8.

[0058] FIG. 8 illustrates operational flow of audio source discriminator 412_1
(FIG. 7) in determining the source of an audio signal, according to one embodiment. In a
block 802, audio source discriminator 412_1 performs a frequency transform operation on
the received frame (see block 602 of FIG. 6) to obtain spectral data of the frame. In this
embodiment, spectrum analyzer 702 of audio source discriminator 412_1 performs a fast
Fourier transform (FFT) algorithm to determine the spectrum of the frame in the range of 
zero to 8 kHz. 
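
For illustration, the spectral data of a frame over the range of zero to 8 kHz can be sketched with a direct discrete Fourier transform. Note that this sketch substitutes a naive O(n²) DFT for the FFT named above so that it stays self-contained; it yields the same bins as an FFT would, only more slowly. The function name `power_spectrum` and the (frequency, power) pair representation are assumptions of this sketch.

```python
import cmath
import math


def power_spectrum(frame, sample_rate_hz=16_000):
    """Return (frequency_hz, power) pairs for bins 0 through n/2.

    With a 16 kHz sampling rate the bins cover zero to 8 kHz.
    Uses a direct DFT in place of an FFT for simplicity.
    """
    n = len(frame)
    bins = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(frame)
        )
        bins.append((k * sample_rate_hz / n, abs(acc) ** 2))
    return bins
```

For a 512-sample frame at 16 kHz the bin spacing would be 16000 / 512 = 31.25 Hz.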

[0059] In alternative embodiments, audio source discriminator 412_1 can obtain
the spectral data using other techniques. For example, in one embodiment, audio source
discriminator 412_1 uses a modulated complex lapped transform (MCLT) algorithm to
determine the spectrum of the audio signal. 

[0060] In a block 804, audio source discriminator 412_1 classifies the frame into
one of a set of frame types. In one embodiment, frame classifier 704 of audio source
discriminator 412_1 classifies the frame into one of three frame types; namely, a live-type,
a phone-type, or an unsure-type. In other embodiments, the set of frame types may be 
different. One embodiment of frame classifier 704 is described in more detail below in 
conjunction with FIGS. 9 and 10. 

[0061] In a block 806, audio source discriminator 412_1 determines the next
state of FSM 706 based on the frame type of the current frame (see block 804) and the
current state of FSM 706. In this embodiment, FSM 706 has a phone state, a live state, and an
unsure state. The next state of FSM 706 defines how audio source discriminator 412_1
has determined the source of a detected audio signal. Thus, if the next state of FSM 706
is the live state, then audio source discriminator 412_1 has determined that the audio signal
source is live speech (i.e., the participant is speaking). But if the next state of FSM 706 is
the phone state, then audio source discriminator 412_1 has determined that the audio signal
source is speakerphone 403_1 (FIG. 4). Finally, in this embodiment, if the next state of
FSM 706 is the unsure state, then audio source discriminator 412_1 cannot determine the
source of the audio signal. One embodiment of FSM 706 is described in more detail 
below in conjunction with FIGS. 12-16. 

[0062] Although a Moore FSM embodiment is described above, in other 
embodiments different types of machines or algorithms can be used to determine the 
source of the audio signal. For example, a hidden Markov model (HMM) machine can be
used in another embodiment. 

[0063] FIG. 9 illustrates parameters generated and/or used by frame 
classifier 704 (FIG. 7) in classifying frames, according to one embodiment. In this 
embodiment, frame classifier 704 generates several parameters used in classifying frames 
from spectral data collected from spectrum analyzer 702 (FIG. 7). The parameters 
include high band noise floor energy (E_NHB) 903, low band noise floor energy (E_NLB) 905,
frame high band energy (E_FHB) 911, frame low band energy (E_FLB) 913, and a ratio 915 of
the frame high band energy to the frame low band energy (E_FHB/E_FLB). In addition, frame
classifier 704 uses two more parameters that, in this embodiment, need not be generated
from frame spectral data; namely, an energy ratio threshold (TH_LIVE) 917 for live speech,
and an energy ratio threshold (TH_PHONE) 919 for phone speech. Thresholds TH_LIVE 917
and TH_PHONE 919 may be predetermined empirically (e.g., using training data). For
example, in one embodiment, TH_LIVE 917 and TH_PHONE 919 are two and twenty,
respectively. In other embodiments, other suitable values may be used for TH_LIVE 917
and TH_PHONE 919.

[0064] In one embodiment, the low band is defined as 100 Hz to 3.4 kHz, and
the high band is defined as 3.4 kHz to 8 kHz. Other ranges can be used in other
embodiments. E_NHB 903 and E_NLB 905 are dynamically tracked using standard noise floor
tracking techniques such as, for example, median-based noise floor tracking (MNFT) 
techniques. In some embodiments, predetermined default values can be used until a 
sufficient number of frames have been processed to determine the noise floor values. 
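
By way of illustration only, the band energies and noise floors described above might be computed along the following lines. This is a minimal sketch, not the described embodiment: the function band_energies, the class MedianNoiseFloor, the window and min_frames parameters, and the use of a plain running median are all assumptions made for the example and are not necessarily the MNFT technique referred to above.

```python
from collections import deque
from statistics import median

LOW_BAND_HZ = (100.0, 3400.0)    # low band as defined in the text
HIGH_BAND_HZ = (3400.0, 8000.0)  # high band as defined in the text


def band_energies(power_spectrum, bin_width_hz):
    """Return (E_Flb, E_Fhb) by summing per-bin power inside each band.

    power_spectrum[k] is assumed to be the power of the bin at k * bin_width_hz.
    """
    e_low = 0.0
    e_high = 0.0
    for k, power in enumerate(power_spectrum):
        freq = k * bin_width_hz
        if LOW_BAND_HZ[0] <= freq < LOW_BAND_HZ[1]:
            e_low += power
        elif HIGH_BAND_HZ[0] <= freq <= HIGH_BAND_HZ[1]:
            e_high += power
    return e_low, e_high


class MedianNoiseFloor:
    """Hypothetical tracker: the floor is the median of recent quiet-frame energies."""

    def __init__(self, default_floor, window=50, min_frames=10):
        self.default_floor = default_floor    # used until enough frames are seen
        self.min_frames = min_frames
        self.history = deque(maxlen=window)   # oldest energies drop out automatically

    def update(self, quiet_frame_energy):
        """Record the band energy of a frame that was deemed not to contain speech."""
        self.history.append(quiet_frame_energy)

    @property
    def value(self):
        if len(self.history) < self.min_frames:
            return self.default_floor
        return median(self.history)
```

One tracker instance would be kept per band (one for E_Nhb 903 and one for E_Nlb 905), each updated only with frames that were deemed silent.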

[0065] FIG. 10 illustrates operational flow of frame classifier 704 (FIG. 7), 
according to one embodiment. In a block 1002, frame classifier 704 determines whether 
the frame contains possible speech (live speech or phone speech) samples. In one 
embodiment, frame classifier 704 performs this operation by analyzing the spectral data 
from spectrum analyzer 702 (FIG. 7). 

[0066] For example, in one embodiment, frame classifier 704 determines the
values of E_Fhb 911, E_Flb 913 and ratio E_Fhb/E_Flb 915 from the frame's spectral data. In
this embodiment, if E_Fhb 911 is greater than E_Nhb 903, or E_Flb 913 is greater than
E_Nlb 905 (i.e., the frame energy is above the noise floor), then the frame is deemed to
contain speech.

[0067] In a decision block 1004, frame classifier 704 checks whether the frame,
as determined in block 1002, contains speech. If not, the frame likely contains data of a
silent period and operational flow for processing this frame terminates. This frame can
then be used to calculate the noise floors E_Nhb 903 and E_Nlb 905. If in block 1002 the
frame was deemed to contain speech, the operational flow proceeds to a block 1006.

[0068] In block 1006, frame classifier 704 determines ratio 915 from the
previously determined values of E_Fhb 911 and E_Flb 913. As previously described,
live speech will typically have some high band energy, resulting in ratio 915 being
greater than zero. In the case of consonants (especially fricatives), ratio 915
will be significantly greater than zero. The operational flow then proceeds to a decision
block 1008.

[0069] In decision block 1008, frame classifier 704 determines whether
ratio 915 is greater than TH_LIVE threshold 917. If ratio 915 is greater than TH_LIVE
threshold 917, then in a block 1010, frame classifier 704 classifies the frame as a
live-type frame. If not, the operational flow proceeds to a decision block 1012.

[0070] In decision block 1012, frame classifier 704 determines whether
ratio 915 is less than TH_PHONE threshold 919. As previously described, speech from a
speakerphone is band limited to 3.4 kHz, resulting in ratio 915 being equal to or near
zero. If ratio 915 is less than TH_PHONE threshold 919, then in a block 1014, frame
classifier 704 classifies the frame as a phone-type frame. If ratio 915 is not less than
TH_PHONE threshold 919, then in a block 1016 frame classifier 704 classifies the frame as
an unsure-type frame. As previously mentioned, thresholds 917 and 919 can be learned
from training data. Frame classifier 704 can then return to block 1002 to classify a next
frame.
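
By way of illustration only, the operational flow of blocks 1002 through 1016 might be sketched as follows. The names FrameType and classify_frame are hypothetical, the thresholds are passed in by the caller rather than fixed, and the handling of a zero low band energy is an assumption that the description does not address.

```python
from enum import Enum


class FrameType(Enum):
    SILENT = "silent"   # frame energy not above either noise floor
    LIVE = "live"
    PHONE = "phone"
    UNSURE = "unsure"


def classify_frame(e_fhb, e_flb, e_nhb, e_nlb, th_live, th_phone):
    """Classify one frame from its band energies and the tracked noise floors."""
    # Blocks 1002/1004: the frame contains speech only if either band
    # rises above its noise floor.
    if not (e_fhb > e_nhb or e_flb > e_nlb):
        return FrameType.SILENT

    # Block 1006: ratio 915 of high band to low band energy.
    # Assumption: a frame with no low band energy is treated as an
    # arbitrarily large ratio.
    ratio = e_fhb / e_flb if e_flb > 0 else float("inf")

    # Blocks 1008/1010: enough high band energy indicates live speech.
    if ratio > th_live:
        return FrameType.LIVE
    # Blocks 1012/1014: almost no high band energy indicates phone speech.
    if ratio < th_phone:
        return FrameType.PHONE
    # Block 1016: neither test passed.
    return FrameType.UNSURE
```

A frame classified as SILENT here corresponds to the case in which the flow terminates at decision block 1004 and the frame is used to update the noise floors.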

[0071] FIG. 11 illustrates operational flow of an alternative embodiment of 
block 1010 (FIG. 10) in classifying a frame as a live-type frame. In this alternative 
embodiment, frame classifier 704 (FIG. 7) performs a further processing step before 
classifying a frame as a live-type frame. FIGS. 11A and 11B illustrate simplified
spectrograms of examples of live speech and external noise, respectively.

[0072] As previously described, to get to block 1010, ratio 915 has already
been determined to be greater than TH_LIVE threshold 917. In a block 1102, frame
classifier 704 compares the distribution of low-band frames (i.e., where E_Fhb/E_Flb is
near zero) and high-band frames (i.e., where E_Fhb/E_Flb is relatively large) to a
predetermined distribution.

[0073] In one embodiment, frame classifier 704 compares the distribution of
low-band and high-band frames in the previous M frames to a distribution of live speech
derived from training. In one embodiment, the training is done during the design phase. 
If the distributions are similar, then it is likely that the current frame is a live speech 
frame. In one example embodiment, frame classifier 704 is configured to compare the 
distributions by determining the number of low-band and high-band frames in the 
previous M frames, and then comparing these numbers to thresholds derived from the 
training. These thresholds can define a range of the number of low-band frames and a 
range of the number of high-band frames in the previous M frames. The operational flow 
then proceeds to a block 1 104. 

[0074] In decision block 1104, if the distributions match, then the operational
flow proceeds to a block 1106. Continuing the example embodiment described above, if
the numbers of low-band and high-band frames meet the aforementioned ranges, then in 
block 1106 frame classifier 704 classifies the current frame as a live-type frame. 
However, if the numbers of low-band and high-band frames do not fall in the ranges, 
frame classifier 704 classifies the current frame as an unsure-type frame. 
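
By way of illustration only, the count-based comparison of blocks 1102 through 1106 might be sketched as follows. The function name refine_live_frame, the cutoffs used to decide whether a frame is a low-band or high-band frame, and the representation of the trained ranges as (minimum, maximum) pairs are assumptions made for the example.

```python
def refine_live_frame(recent_ratios, low_cutoff, high_cutoff, low_range, high_range):
    """Return "live" if the previous M frames look like live speech, else "unsure".

    recent_ratios: E_Fhb/E_Flb values of the previous M frames.
    low_cutoff / high_cutoff: hypothetical cutoffs for calling a frame
        low-band (ratio near zero) or high-band (ratio relatively large).
    low_range / high_range: (min, max) counts derived from training.
    """
    # Block 1102: count low-band and high-band frames in the window.
    low_count = sum(1 for r in recent_ratios if r <= low_cutoff)
    high_count = sum(1 for r in recent_ratios if r >= high_cutoff)

    # Block 1104: both counts must fall inside their trained ranges.
    low_ok = low_range[0] <= low_count <= low_range[1]
    high_ok = high_range[0] <= high_count <= high_range[1]

    # Block 1106 on a match; otherwise the frame is an unsure-type frame.
    return "live" if (low_ok and high_ok) else "unsure"
```

The intuition is that live speech alternates between vowels (low-band frames) and consonants (high-band frames), so a window made up only of high-band frames is more likely to be external noise.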

[0075] In an alternative embodiment, frame classifier 704 can be configured to
determine whether the previous frame contained speech (e.g., as determined in
block 1002 of FIG. 10 when frame classifier 704 classified the previous frame). In one
embodiment, if there is no previous frame (e.g., the current frame is the first frame of the
session), then the default determination is that the "previous" frame did not contain
speech.

[0076] As previously described in conjunction with FIGS. 5A-5C, time gaps
between sounds in live speech tend to be relatively short, whereas time gaps between
noise and speech may be relatively long. Thus, as shown in the live speech example of
FIG. 11A, each of frames 1112-1114 contains speech. However, as shown in the external
noise example of FIG. 11B, frames 1122 and 1124 contain "speech" (i.e., spectral data
that frame classifier 704 would classify as speech), whereas frames 1121 and 1123 would
not be classified as speech.

[0077] Thus, if the previous frame contained speech, then frame classifier 704 
would classify the current frame as a live-type frame in this alternative embodiment. 
However, if the previous frame did not contain speech, frame classifier 704 would 
classify the current frame as an unsure-type frame. 
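
By way of illustration only, this previous-frame rule might be sketched as follows. The function name label_candidate_frames and the representation of each frame as a pair of booleans are assumptions made for the example.

```python
def label_candidate_frames(frames):
    """Apply the previous-frame rule to a sequence of frames.

    frames: iterable of (has_speech, exceeds_th_live) pairs, one per frame,
        where has_speech is the result of block 1002 and exceeds_th_live is
        the result of decision block 1008.
    Returns a list with "live" or "unsure" for each frame that reached
    block 1010, and None for every other frame.
    """
    labels = []
    previous_had_speech = False  # default when there is no previous frame
    for has_speech, exceeds_th_live in frames:
        if has_speech and exceeds_th_live:
            labels.append("live" if previous_had_speech else "unsure")
        else:
            labels.append(None)
        previous_had_speech = has_speech
    return labels
```

With the alternating pattern of the external noise example of FIG. 11B (no speech, "speech", no speech, "speech"), every candidate frame is labeled unsure, because none of them follows a frame that contained speech.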

[0078] FIG. 12 illustrates a state diagram of FSM 706 (FIG. 7), according to
one embodiment. This embodiment of FSM 706 includes a phone state 1201, a live
state 1202, and an unsure state 1203. FIG. 13 schematically illustrates how an
embodiment of FSM 706 (FIG. 7) transitions from a current state to a next state as a
function of various parameters and thresholds. As shown in FIG. 13, the current state of
FSM 706 is represented as current state 1302 and the next state is represented by next
state 1304. In this embodiment, FSM 706 generates the following parameters: a
Current_Frame_Time 1305; a Last_Speech_Time 1306; a Last_Live_Time 1308; a
Phone_Count 1310; a Live_Count 1312; and a Cumu_Count 1314. Threshold parameters
include a Live_Count threshold 1316, a Phone_Count threshold 1318, a Last_Live_Time
threshold 1320, a Last_Speech_Time threshold 1322, and a Cumu_Count threshold 1324.
These thresholds need not be generated by FSM 706 and may be predetermined
empirically (e.g., using training data).

[0079] Current_Frame_Time 1305 has a value representing the time stamp of
the current frame. Last_Speech_Time 1306 has a value that represents the time stamp of
the most recent frame classified as either live-type or phone-type. Last_Live_Time 1308
has a value that represents the time stamp of the most recent frame classified as a
live-type frame. Phone_Count 1310 has a value representing the number of the last L frames
classified as phone-type frames. Live_Count 1312 has a value representing the number
of the last L frames classified as live-type frames. Cumu_Count 1314 has a value related
to the number of frames since the last live-type frame. For example, in one embodiment,
Cumu_Count 1314 when reset has a value of twenty. In this example, if the subsequent
frame is not a live-type frame, Cumu_Count 1314 is decreased by some number,
whereas if the subsequent frame is a live-type frame, Cumu_Count 1314 is reset.
Referring to FIGS. 12 and 13, FSM 706 transitions from one state to another as follows.
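
By way of illustration only, the bookkeeping for these parameters might be sketched as follows. The class name FsmCounters, the string frame labels, and the decrement cumu_step are assumptions made for the example; only the reset value of twenty comes from the example embodiment above.

```python
from collections import deque
from dataclasses import dataclass, field

CUMU_RESET = 20  # reset value of Cumu_Count in the example embodiment


@dataclass
class FsmCounters:
    window: int                      # L, the number of recent frames counted
    cumu_step: int = 1               # hypothetical decrement ("some number")
    current_frame_time: float = 0.0  # Current_Frame_Time 1305
    last_speech_time: float = 0.0    # Last_Speech_Time 1306
    last_live_time: float = 0.0      # Last_Live_Time 1308
    cumu_count: int = CUMU_RESET     # Cumu_Count 1314
    recent: deque = field(default_factory=deque)

    def update(self, frame_type, timestamp):
        """Fold one classified frame ("live", "phone" or "unsure") into the parameters."""
        self.current_frame_time = timestamp
        self.recent.append(frame_type)
        if len(self.recent) > self.window:
            self.recent.popleft()    # keep only the last L frames
        if frame_type in ("live", "phone"):
            self.last_speech_time = timestamp
        if frame_type == "live":
            self.last_live_time = timestamp
            self.cumu_count = CUMU_RESET
        else:
            self.cumu_count -= self.cumu_step

    @property
    def live_count(self):            # Live_Count 1312
        return sum(1 for t in self.recent if t == "live")

    @property
    def phone_count(self):           # Phone_Count 1310
        return sum(1 for t in self.recent if t == "phone")
```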

[0080] From phone state 1201, FSM 706 can transition to live state 1202 when
the current frame has been classified as a live-type frame and parameter
Live_Count 1312 is greater than Live_Count threshold 1316. FSM 706 can also
transition from phone state 1201 to unsure state 1203 when the current frame has been
classified as a live-type frame and parameter Last_Speech_Time 1306 is greater than
Last_Speech_Time threshold 1322. Transitions from phone state 1201 are described
further in conjunction with FIG. 14, for one embodiment of FSM 706.

[0081] From live state 1202, FSM 706 can transition to phone state 1201 when
the current frame has been classified as a phone-type frame and parameter
Cumu_Count 1314 is less than Cumu_Count threshold 1324. FSM 706 can also
transition from live state 1202 to unsure state 1203 when the current frame has been
classified as a phone-type frame, parameter Cumu_Count 1314 is greater than
Cumu_Count threshold 1324, and parameter Last_Live_Time 1308 is greater than
Last_Live_Time threshold 1320. Transitions from live state 1202 are described further in
conjunction with FIG. 15, for one embodiment of FSM 706.

[0082] From unsure state 1203, FSM 706 can transition to phone state 1201
when parameter Phone_Count 1310 is greater than Phone_Count threshold 1318.
FSM 706 can transition from unsure state 1203 to live state 1202 when parameter
Live_Count 1312 is greater than Live_Count threshold 1316. Transitions from unsure
state 1203 are described further in conjunction with FIG. 16, for one embodiment of
FSM 706.

[0083] FIG. 14 illustrates operational flow of FSM 706 (FIG. 7) in determining 
its next state from phone state 1201, according to one embodiment. Referring to 
FIGS. 12-14, FSM 706 operates as follows in determining its next state from phone 
state 1201. 

[0084] Starting with FSM 706 having a current state 1302 of phone state 1201,
in a block 1402, FSM 706 determines whether the current frame is a live-type frame. In 
this embodiment, FSM 706 gets this information from previously described frame 
classifier 704 (FIG. 7). If the frame type is not a live-type frame, the operational flow 
proceeds to a block 1404 in which FSM 706 causes the next state 1304 to be phone 
state 1201 (i.e., there is no state transition in FSM 706). 

[0085] However, if in block 1402 FSM 706 finds that the current frame is a
live-type frame, in a block 1406 FSM 706 compares parameter Live_Count 1312 with
Live_Count threshold 1316. If Live_Count 1312 is greater than or equal to Live_Count
threshold 1316, in a block 1408 FSM 706 causes next state 1304 to be live state 1202.
The rationale for this operation is that FSM 706 will wait for a certain number of
live-type frames before transitioning from phone state 1201 to live state 1202 to help ensure
that the speech is really live speech and not phone speech combined with external noise.

[0086] On the other hand, if Live_Count 1312 is less than Live_Count
threshold 1316 in block 1406, in a decision block 1410, FSM 706 determines whether
parameter Last_Speech_Time 1306 is greater than or equal to Last_Speech_Time
threshold 1322. If Last_Speech_Time 1306 is greater than or equal to Last_Speech_Time
threshold 1322, then FSM 706 causes next state 1304 to be unsure state 1203 in a
block 1412. The rationale for this operation is that because the last speech (either live
speech or phone speech) occurred a relatively long time ago and "suddenly" a live-type
frame is received, it is no longer clear what kind of speech is being detected.

[0087] However, if in block 1410 Last_Speech_Time 1306 is less than
Last_Speech_Time threshold 1322, FSM 706 causes next state 1304 to be phone
state 1201 (i.e., proceeds to block 1404). The rationale for this operation is that because
the last speech (either live speech or phone speech) occurred a relatively short time ago,
the current live-type frame is probably really speech; however, because not enough
live-type frames have occurred (i.e., block 1406), FSM 706 remains in phone state 1201.
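
By way of illustration only, the flow of FIG. 14 might be sketched as follows. The function name next_state_from_phone and the string state labels are hypothetical. The sketch also assumes that the quantity compared in decision block 1410 is the time elapsed since the last speech frame, which is one reading of the comparison of Last_Speech_Time 1306 with its threshold.

```python
def next_state_from_phone(frame_type, live_count, live_count_threshold,
                          time_since_last_speech, last_speech_time_threshold):
    """Return the next state ("phone", "live" or "unsure") from phone state 1201."""
    # Block 1402: only a live-type frame can move the FSM out of the phone state.
    if frame_type != "live":
        return "phone"                       # block 1404
    # Block 1406: enough recent live-type frames to trust the change.
    if live_count >= live_count_threshold:
        return "live"                        # block 1408
    # Block 1410: a live-type frame after a long gap is ambiguous.
    if time_since_last_speech >= last_speech_time_threshold:
        return "unsure"                      # block 1412
    return "phone"                           # block 1404
```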

[0088] FIG. 15 illustrates operational flow of FSM 706 (FIG. 7) in determining 
its next state 1304 (FIG. 13) from live state 1202, according to one embodiment. 
Referring to FIGS. 12, 13 and 15, FSM 706 operates as follows in determining its next
state from live state 1202. 

[0089] Starting with FSM 706 having a current state 1302 of live state 1202, in 
a block 1502, FSM 706 determines whether the current frame is a live-type frame. In this 
embodiment, FSM 706 gets this information from previously described frame 
classifier 704 (FIG. 7). If the frame type is a live-type frame, the operational flow
proceeds to a block 1504 in which FSM 706 resets parameter Cumu_Count 1314. Then
in a block 1506, FSM 706 causes the next state 1304 to be live state 1202 (i.e., there is no 
state transition in FSM 706). 

[0090] However, if in block 1502 the current frame is not a live-type frame,
FSM 706 determines whether the current frame is a phone-type frame in a decision
block 1508. If the current frame is not a phone-type frame, FSM 706 decreases
parameter Cumu_Count 1314 in a block 1510. This operation, in effect, allows
Cumu_Count 1314 to keep track of a "confidence level" of the most recent live-type frame.
That is, because the current frame is neither a live-type frame nor a phone-type frame, the
confidence in the classification of the most recent live-type frame should
be reduced.

[0091] From block 1510, the operational flow proceeds to block 1506, in which
FSM 706 again causes next state 1304 to be live state 1202. The rationale for this 
operation is that even though the current frame is neither a live-type nor phone-type 
frame, because the current state is live state 1202, the participant is likely to still be 
speaking. For example, the frame could have been taken from a period of silence 
between words, or at a point in which some out-of-phase noise happened to cancel out 
some of the live speech. In such a case, next state 1304 should be the same as current 
state 1302 (i.e., remain in live state 1202). However, if in block 1508 the current frame is 
a phone-type frame, the operational flow proceeds to a decision block 1512. 

[0092] In decision block 1512, FSM 706 determines whether the difference
between parameters Current_Frame_Time 1305 and Last_Live_Time 1308 is greater than
or equal to Last_Live_Time threshold 1320. If not (i.e., the last live-type
frame was relatively recent), operational flow proceeds to block 1506. The rationale for
this operation is that if the last live-type frame was relatively recent, then it
could be that the current frame was really a live-type frame that was mistakenly classified
as a phone-type frame (e.g., the frame contained a vowel). In this case, next state 1304
should be the same as current state 1302 (i.e., remain in live state 1202).

[0093] However, if in decision block 1512, the difference between parameters
Current_Frame_Time 1305 and Last_Live_Time 1308 is greater than or equal to
Last_Live_Time threshold 1320 (i.e., the last live-type frame was relatively long ago),
FSM 706 decreases parameter Cumu_Count 1314 in a block 1514. In one embodiment,
FSM 706 decreases Cumu_Count 1314 at a faster rate than in block 1510. The rationale of
this operation is that because the current frame was classified as a phone-type frame and the
most recent live-type frame occurred a relatively long time ago, there should be less
confidence that the most recent live-type frame was correctly classified.

[0094] In a decision block 1516, FSM 706 then determines whether parameter
Cumu_Count 1314 is greater than or equal to Cumu_Count threshold 1324. In one
embodiment, Cumu_Count threshold 1324 is set to zero. If Cumu_Count 1314 is greater
than or equal to Cumu_Count threshold 1324, then the operational flow proceeds to a
block 1518. In block 1518, FSM 706 causes next state 1304 to be unsure state 1203.
In this case, there is some confidence that the most recent live-type frame was correctly
classified as live speech, but because the last live-type frame was long ago, FSM 706 can
no longer be sure that next state 1304 should be live state 1202.

[0095] However, if in block 1516 Cumu_Count 1314 is less than Cumu_Count
threshold 1324, then FSM 706 in a block 1520 causes next state 1304 to be phone
state 1201. Because there is, in effect, no confidence that the most recent live-type frame
(which occurred a relatively long time ago) was correctly classified as live-type,
FSM 706 treats the current frame (i.e., phone-type) as the correct frame type. Thus,
FSM 706 causes next state 1304 to be phone state 1201.
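
By way of illustration only, the flow of FIG. 15 might be sketched as follows, reading decision block 1512 as it is first described (an elapsed time greater than or equal to Last_Live_Time threshold 1320 leads to block 1514). The function name next_state_from_live, the string state labels, and the step sizes slow_step and fast_step are assumptions made for the example; the reset value of twenty and the zero threshold come from the example embodiments.

```python
CUMU_RESET = 20  # reset value of Cumu_Count in the example embodiment


def next_state_from_live(frame_type, cumu_count, current_frame_time, last_live_time,
                         last_live_time_threshold, cumu_count_threshold=0,
                         slow_step=1, fast_step=5):
    """Return (next_state, updated_cumu_count) from live state 1202."""
    # Blocks 1502/1504/1506: a live-type frame restores full confidence.
    if frame_type == "live":
        return "live", CUMU_RESET
    # Blocks 1508/1510/1506: an unsure-type frame lowers confidence slowly.
    if frame_type != "phone":
        return "live", cumu_count - slow_step
    # Block 1512: a phone-type frame shortly after a live-type frame is
    # treated as a likely misclassification (e.g., a vowel).
    if current_frame_time - last_live_time < last_live_time_threshold:
        return "live", cumu_count
    # Block 1514: a phone-type frame long after the last live-type frame
    # lowers confidence at the faster rate.
    cumu_count -= fast_step
    # Blocks 1516/1518/1520: remaining confidence decides unsure versus phone.
    if cumu_count >= cumu_count_threshold:
        return "unsure", cumu_count
    return "phone", cumu_count
```

Returning the updated count alongside the next state keeps the function free of hidden state, so a caller can store the value in whatever structure holds Cumu_Count 1314.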

[0096] FIG. 16 illustrates operational flow in FSM 706 (FIG. 7) in determining 
its next state 1304 (FIG. 13) from unsure state 1203 (FIG. 12). Referring to FIGS. 12, 13 
and 16, FSM 706 operates as follows in determining its next state from unsure state 1203. 

[0097] Starting with FSM 706 having a current state 1302 of unsure state 1203, 
in a block 1602, FSM 706 determines whether the current frame is a live-type frame. In 
this embodiment, FSM 706 gets this information from previously described frame 
classifier 704 (FIG. 7). If the frame type is a live-type frame, the operational flow 
proceeds to a decision block 1604 in which FSM 706 determines whether parameter 
Live_Count 1312 is greater than or equal to Live_Count threshold 1316. 

[0098] If in decision block 1604 Live_Count 1312 is greater than or equal to 
Live_Count threshold 1316, the operational flow proceeds to a block 1606. In
block 1606, FSM 706 causes next state 1304 to be live state 1202. This operation
reflects the fact that including the current frame, there are enough live-type frames in the 
last K frames to be confident that live speech is really being detected. 

[0099] However, if in decision block 1604 Live_Count 1312 is less than 
Live_Count threshold 1316, the operational flow proceeds to a block 1608. In 
block 1608, FSM 706 causes next state 1304 to be in unsure state 1203. This operation 
reflects the fact that there have not been enough live-type frames to transition to live 
state 1202 from unsure state 1203. 

[0100] Referring back to decision block 1602, if the current frame is not a
live-type frame, the operational flow proceeds to a decision block 1610. In decision
block 1610, FSM 706 determines whether the current frame is a phone-type frame. If the
current frame is not a phone-type frame, the operational flow proceeds to block 1608. In
this embodiment, if the current frame is neither a live-type frame nor a phone-type frame,
then it is an unsure-type frame. Thus, if the current state is unsure state 1203 and the
current frame is an unsure-type frame, then next state 1304 should also be the unsure
state.

[0101] However, if in decision block 1610 the current frame is a phone-type 
frame, the operational flow proceeds to a decision block 1612. In decision block 1612, 
FSM 706 determines whether parameter Phone_Count 1310 is greater than or equal to
Phone_Count threshold 1318. 

[0102] If in decision block 1612 Phone_Count 1310 is greater than or equal to 
Phone_Count threshold 1318, the operational flow proceeds to a block 1614. In
block 1614, FSM 706 causes next state 1304 to be phone state 1201. This
operation reflects the fact that including the current frame, there are enough phone-type 
frames in the last K frames to be confident that phone speech is really being detected. 

[0103] However, if in decision block 1612 Phone_Count 1310 is less than
Phone_Count threshold 1318, the operational flow proceeds to block 1608. As
previously described, block 1608 causes next state 1304 to be unsure state 1203. This
operation reflects the fact that there have not been enough phone-type frames to transition
to phone state 1201 from unsure state 1203.
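
By way of illustration only, the flow of FIG. 16 might be sketched as follows, with block 1608 keeping the FSM in unsure state 1203 as stated where that block is first introduced. The function name next_state_from_unsure and the string state labels are hypothetical.

```python
def next_state_from_unsure(frame_type, live_count, live_count_threshold,
                           phone_count, phone_count_threshold):
    """Return the next state ("live", "phone" or "unsure") from unsure state 1203."""
    # Blocks 1602/1604: a live-type frame leads to the live state only if
    # enough recent frames were also live-type (block 1606), else block 1608.
    if frame_type == "live":
        return "live" if live_count >= live_count_threshold else "unsure"
    # Blocks 1610/1612: a phone-type frame leads to the phone state only if
    # enough recent frames were also phone-type (block 1614), else block 1608.
    if frame_type == "phone":
        return "phone" if phone_count >= phone_count_threshold else "unsure"
    # Block 1608: an unsure-type frame leaves the state unchanged.
    return "unsure"
```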

[0104] FIG. 17 illustrates a general computer environment 1700, which can be 
used to implement the techniques described herein. The computer environment 1700 is 
only one example of a computing environment and is not intended to suggest any 
limitation as to the scope of use or functionality of the computer and network 
architectures. Neither should the computer environment 1700 be interpreted as having
any dependency or requirement relating to any one or combination of components
illustrated in the example computer environment 1700.

[0105] Computer environment 1700 includes a general-purpose computing 
device in the form of a computer 1702. The components of computer 1702 can include, 
but are not limited to, one or more processors or processing units 1704, system 
memory 1706, and system bus 1708 that couples various system components including
processor 1704 to system memory 1706. 

[0106] System bus 1708 represents one or more of any of several types of bus 
structures, including a memory bus or memory controller, a peripheral bus, an accelerated 
graphics port, and a processor or local bus using any of a variety of bus architectures. By 
way of example, such architectures can include an Industry Standard Architecture (ISA) 
bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video 
Electronics Standards Association (VESA) local bus, a Peripheral Component
Interconnect (PCI) bus also known as a Mezzanine bus, a PCI Express bus, a Universal
Serial Bus (USB), a Secure Digital (SD) bus, or an IEEE 1394 (i.e., FireWire) bus.

[0107] Computer 1702 may include a variety of computer readable media. 
Such media can be any available media that is accessible by computer 1702 and includes 
both volatile and non-volatile media, removable and non-removable media.

[0108] System memory 1706 includes computer readable media in the form of
volatile memory, such as random access memory (RAM) 1710; and/or non-volatile
memory, such as read only memory (ROM) 1712 or flash RAM. Basic input/output
system (BIOS) 1714, containing the basic routines that help to transfer information
between elements within computer 1702, such as during start-up, is stored in ROM 1712
or flash RAM. RAM 1710 typically contains data and/or program modules that are
immediately accessible to and/or presently operated on by processing unit 1704.

[0109] Computer 1702 may also include other removable/non-removable, 
volatile/non-volatile computer storage media. By way of example, FIG. 17 illustrates 
hard disk drive 1716 for reading from and writing to non-removable, non-volatile
magnetic media (not shown), magnetic disk drive 1718 for reading from and writing to
removable, non-volatile magnetic disk 1720 (e.g., a "floppy disk"), and optical disk
drive 1722 for reading from and/or writing to a removable, non-volatile optical disk 1724 
such as a CD-ROM, DVD-ROM, or other optical media. Hard disk drive 1716, magnetic 
disk drive 1718, and optical disk drive 1722 are each connected to system bus 1708 by 
one or more data media interfaces 1725. Alternatively, hard disk drive 1716, magnetic 
disk drive 1718, and optical disk drive 1722 can be connected to the system bus 1708 by 
one or more interfaces (not shown). 

[0110] The disk drives and their associated computer-readable media provide
non-volatile storage of computer readable instructions, data structures, program modules,
and other data for computer 1702. Although the example illustrates a hard disk 1716,
removable magnetic disk 1720, and removable optical disk 1724, it is appreciated that
other types of computer readable media which can store data that is accessible by a
computer, such as magnetic cassettes or other magnetic storage devices, flash memory
cards, CD-ROM, digital versatile disks (DVD) or other optical storage, random access
memories (RAM), read only memories (ROM), electrically erasable programmable
read-only memory (EEPROM), and the like, can also be utilized to implement the example
computing system and environment.

[0111] Any number of program modules can be stored on hard disk 1716,
magnetic disk 1720, optical disk 1724, ROM 1712, and/or RAM 1710, including by way
of example, operating system 1726, one or more application programs 1728, other
program modules 1730, and program data 1732. Each of such operating system 1726,
one or more application programs 1728, other program modules 1730, and program
data 1732 (or some combination thereof) may implement all or part of the resident
components that support the distributed file system.

[0112] A user can enter commands and information into computer 1702 via 
input devices such as keyboard 1734 and a pointing device 1736 (e.g., a "mouse"). Other 
input devices 1738 (not shown specifically) may include a microphone, joystick, game 
pad, satellite dish, serial port, scanner, and/or the like. These and other input devices are 
connected to processing unit 1704 via input/output interfaces 1740 that are coupled to 
system bus 1708, but may be connected by other interface and bus structures, such as a 
parallel port, game port, or a universal serial bus (USB). 

[0113] Monitor 1742 or other type of display device can also be connected to 
the system bus 1708 via an interface, such as video adapter 1744. In addition to 
monitor 1742, other output peripheral devices can include components such as speakers 
(not shown) and printer 1746, which can be connected to computer 1702 via I/O 
interfaces 1740. 

[0114] Computer 1702 can operate in a networked environment using logical 
connections to one or more remote computers, such as remote computing device 1748. 
By way of example, remote computing device 1748 can be a PC, portable computer, a 
server, a router, a network computer, a peer device or other common network node, and 
the like. Remote computing device 1748 is illustrated as a portable computer that can
include many or all of the elements and features described herein relative to
computer 1702. Alternatively, computer 1702 can operate in a non-networked
environment as well. 

[0115] Logical connections between computer 1702 and remote computer 1748 
are depicted as a local area network (LAN) 1750 and a general wide area network 
(WAN) 1752. Such networking environments are commonplace in offices,
enterprise-wide computer networks, intranets, and the Internet.

[0116] When implemented in a LAN networking environment, computer 1702 
is connected to local network 1750 via network interface or adapter 1754. When 
implemented in a WAN networking environment, computer 1702 typically includes 
modem 1756 or other means for establishing communications over wide network 1752. 
Modem 1756, which can be internal or external to computer 1702, can be connected to 
system bus 1708 via I/O interfaces 1740 or other appropriate mechanisms. It is to be 
appreciated that the illustrated network connections are examples and that other means of 
establishing at least one communication link between computers 1702 and 1748 can be 
employed. 

[0117] In a networked environment, such as that illustrated with computing
environment 1700, program modules depicted relative to computer 1702, or portions
thereof, may be stored in a remote memory storage device. By way of example, remote
application programs 1758 reside on a memory device of remote computer 1748. For
purposes of illustration, applications or programs and other executable program
components such as the operating system are illustrated herein as discrete blocks,
although it is recognized that such programs and components reside at various times in
different storage components of computing device 1702, and are executed by at least one
data processor of the computer.

[0118] Various modules and techniques may be described herein in the general 
context of computer-executable instructions, such as program modules, executed by one 
or more computers or other devices. Generally, program modules include routines,
programs, objects, components, data structures, etc. that perform particular tasks or
implement particular abstract data types. Typically, the functionality of the program
modules may be combined or distributed as desired in various embodiments. 

[0119] An implementation of these modules and techniques may be stored on 
or transmitted across some form of computer readable media. Computer readable media 
can be any available media that can be accessed by a computer. By way of example, and 
not limitation, computer readable media may comprise "computer storage media" and
"communication media."

[0120] "Computer storage media" includes volatile and non-volatile, removable 
and non-removable media implemented in any method or technology for storage of 
information such as computer readable instructions, data structures, program modules, or 
other data. Computer storage media includes, but is not limited to, RAM, ROM, 
EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks 
(DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage 
or other magnetic storage devices, or any other medium which can be used to store the 
desired information and which can be accessed by a computer.

[0121] "Communication media" typically embodies computer readable 
instructions, data structures, program modules, or other data in a modulated data signal, 
such as a carrier wave or other transport mechanism. Communication media also includes
any information delivery media. The term "modulated data signal" means a signal that
has one or more of its characteristics set or changed in such a manner as to encode
information in the signal. As a non-limiting example only, communication media 
includes wired media such as a wired network or direct-wired connection, and wireless 
media such as acoustic, RF, infrared, and other wireless media. Combinations of any of 
the above are also included within the scope of computer readable media. 

[0122] Reference has been made throughout this specification to "one 
embodiment," "an embodiment," or "an example embodiment" meaning that a particular 
described feature, structure, or characteristic is included in at least one embodiment of the 
present invention. Thus, usage of such phrases may refer to more than just one 
embodiment. Furthermore, the described features, structures, or characteristics may be 
combined in any suitable manner in one or more embodiments. 

[0123] One skilled in the relevant art may recognize, however, that the 
invention may be practiced without one or more of the specific details, or with other 
methods, resources, materials, etc. In other instances, well known structures, resources, 
or operations have not been shown or described in detail merely to avoid obscuring 
aspects of the invention. 

[0124] While example embodiments and applications have been illustrated and 
described, it is to be understood that the invention is not limited to the precise 
configuration and resources described above. Various modifications, changes, and 
variations apparent to those skilled in the art may be made in the arrangement, operation, 
and details of the methods and systems of the present invention disclosed herein without 
departing from the scope of the claimed invention.